Understanding the problem of when to scale

This project

  • 250 000 kr

  • 166 hours

  • 1 month

  • Manuscript on pre-print server

  • Produce guidelines

Blåbærkrisa (the "bilberry crisis")

  • Same data gave different indicator values.
  • Reference values are at county scales (and probably only make sense at that scale)
  • IBECA normalised the variables (with truncation) at plot scale
  • NI normalised the variable (probably) at a regional scale after first taking a mean

It matters

whether we aggregate (take means of) the scaled or the unscaled variable:
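A minimal numeric sketch of why the order matters (the plot values and the reference value are illustrative; scaling here is division by the reference, truncated at 1):

```python
# Hypothetical plot-level measurements with a reference value of 50:
measurements = [30, 45, 80]   # one plot exceeds the reference
ref = 50

def normalise(x):
    """Linear range scaling followed by truncation to [0, 1]."""
    return min(x / ref, 1.0)

def mean(v):
    return sum(v) / len(v)

# Scale early, then aggregate (plot-scale normalisation):
early = mean([normalise(x) for x in measurements])   # (0.6 + 0.9 + 1.0) / 3

# Aggregate first, then scale (regional-scale normalisation):
late = normalise(mean(measurements))                 # 155/3 > 50, truncated to 1.0

# Same data, two different indicator values.
```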

Pathways

Glossary

Scaling = Linear range scaling, usually reducing the range of the variable by using one or two reference values

Truncating = Cutting off values above or below a given threshold. In our case, making the metrics bounded between 0 and 1. Truncation is also a type of scaling.

Re-scaling = not used here, but often used as a synonym for normalisation.

Normalising = a combination of scaling, followed by truncating (if needed). This leads to a non-linear transformation of the original variable and returns indicators that share the same scale.

Non-linear scaling = scaling functions like truncation, sigmoid, exponential or break-point types.
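The glossary terms can be sketched as functions (the two-reference-value form and the sigmoid parameters are illustrative assumptions, not a fixed convention):

```python
import math

def scale(x, low, high):
    """Linear range scaling: map [low, high] onto [0, 1]; results may fall outside."""
    return (x - low) / (high - low)

def truncate(x, lo=0.0, hi=1.0):
    """Cut off values above/below the thresholds, bounding the metric to [0, 1]."""
    return max(lo, min(hi, x))

def normalise(x, low, high):
    """Scaling followed by truncation: non-linear overall, common [0, 1] scale."""
    return truncate(scale(x, low, high))

def sigmoid_scale(x, midpoint, steepness=1.0):
    """One example of a non-linear scaling function with asymptotes at 0 and 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))
```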

Spatially aggregating

condition estimates (bottom) or actual measurements (top)

Commutativity

Early scaling

leads to commutativity
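The commutativity claim can be checked numerically for purely linear (affine) scaling, which commutes with taking the mean as long as we do not truncate (values and reference range are illustrative):

```python
xs = [30, 45, 80]
low, high = 0, 100

def scale(x):
    # Affine map onto [0, 1] relative to the reference range.
    return (x - low) / (high - low)

def mean(v):
    return sum(v) / len(v)

# Mean of scaled values equals the scaled mean:
assert abs(mean([scale(x) for x in xs]) - scale(mean(xs))) < 1e-12
```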


Also with area weighting


Alternatively,

we can aggregate the actual measurements instead. When would this make sense?

Same, but with no area weighting

and taking the sum (e.g. population sizes)
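For sum-type variables such as population sizes, the two pathways again disagree once truncation enters (counts and reference values are hypothetical):

```python
# Hypothetical population counts per sub-area, each with its own reference.
pops = [120, 30, 450]
refs = [200, 100, 400]

# Aggregate the actual measurements first (sum), then normalise
# against the summed reference:
late = min(sum(pops) / sum(refs), 1.0)    # 600/700

# Normalise each sub-area first, then average the condition estimates:
early = sum(min(p / r, 1.0) for p, r in zip(pops, refs)) / len(pops)
# (0.6 + 0.3 + 1.0) / 3

# The pathways disagree because truncation is non-linear.
```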

Why do we normalise?

notes from Bård

Reference values serve two purposes

  • to enable rescaling to a common measurement scale to facilitate the calculation of the mean.
  • to set a limit for how much one indicator can compensate for other indicators being in a bad state.

As a consequence, an index of rescaled indicators summarizes negative deviations from the reference state over a large set of indicators.

Unscaled states above the reference value are not recognized as being better than the reference state.

Rescaling is useful for making indices, but why use it for individual indicators? [answer: it depends on whether we want to aggregate 'condition estimates' or actual measurements]

Arguments for scaling and truncating immediately

  • High or low outliers cannot compensate for low or high values (i.e. condition) elsewhere, respectively.
  • Gives higher-resolution indicator values and preserves the original spatial resolution of the variable.
  • Means that subsequent aggregation carries information about the actual condition, not about the raw variable.
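The compensation argument in a numeric sketch (illustrative plot values, reference = 100): one extreme plot far above the reference cannot lift the indicator when we truncate at plot scale.

```python
xs = [20, 30, 400]   # one outlier far above the reference
ref = 100

def mean(v):
    return sum(v) / len(v)

early = mean([min(x / ref, 1.0) for x in xs])   # (0.2 + 0.3 + 1.0) / 3 = 0.5
late = min(mean(xs) / ref, 1.0)                 # 150/100, truncated to 1.0
# Scaling late, the outlier completely masks the two plots in poor condition.
```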

Arguments for scaling and truncating later at a level where the reference value is sensible

  • The reference value was maybe intended for that scale
  • The indicator uncertainty becomes (apparently) smaller because the mean or sum for a region is less variable.
  • Maybe too many truncation events lead to accumulated displacement errors?
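One way such a displacement could arise can be simulated (a sketch under the assumption that plot values fluctuate symmetrically around the reference): plot-scale truncation clips every positive deviation but keeps every negative one, pushing the aggregated indicator below the reference.

```python
import random

random.seed(1)
ref = 100
# Simulated plot values with symmetric noise around the reference:
xs = [ref + random.uniform(-20, 20) for _ in range(1000)]

def mean(v):
    return sum(v) / len(v)

early = mean([min(x / ref, 1.0) for x in xs])   # expected value 0.95: biased low
late = min(mean(xs) / ref, 1.0)                 # close to 1.0
```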

What if

  • The reference value is at the same scale as the variable
  • The reference value is the same all over
  • We also want to aggregate the original variable further, e.g. to the national level: will the indicator and the variable remain correlated?
  • We want to use non-linear re-scaling functions with asymptotes

Resources

Sandvik 2019